Programming Massively Parallel Processors: A Hands-on Approach: The Great Divergence: The Trajectory of Computing Performance

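The Great Divergence marks a fundamental turning point in the history of microprocessor development. Between 2001 and 2009, the performance trajectories of CPUs and GPUs split apart, opening a wide capability gap. Traditional CPUs ran into the Power Wall, where raising clock frequencies generated more heat than could be dissipated. GPUs, meanwhile, used the revenue from their enormous consumer gaming user base to drive a shift toward extremely parallel architectures.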
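A common first-order model of CMOS switching power helps explain the Power Wall: dynamic power scales roughly as P = C * V^2 * f, and a higher clock frequency usually requires a higher supply voltage, so power grows faster than frequency. The sketch below uses normalized, hypothetical values (not measurements of any real CPU) and assumes, purely for illustration, that doubling the clock needs a 20% voltage increase; the function name dynamic_power is hypothetical.

```python
# First-order CMOS dynamic power model: P = C * V^2 * f
# (the switching activity factor is folded into C).
# All values are normalized and hypothetical, not real chip data.


def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Return switching power under the simple P = C * V^2 * f model."""
    return capacitance * voltage ** 2 * frequency


# Baseline design point, normalized to 1.0 for every parameter.
baseline = dynamic_power(1.0, 1.0, 1.0)

# Hypothetical scenario: double the clock, assuming a 20% higher supply
# voltage is needed to keep the faster circuit stable.
boosted = dynamic_power(1.0, 1.2, 2.0)

print(f"Clock: 2.00x, power: {boosted / baseline:.2f}x")
```

Under these assumptions, doubling the clock nearly triples the power, and that extra power leaves the chip as heat. This superlinear growth is why frequency scaling stalled.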

Key Turning Points

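By 2003, the gap began to widen. CPUs remained focused on sequential logic and low-latency optimization. GPUs instead devoted a large share of their transistor budget to arithmetic logic units (ALUs). This let GPU throughput climb from the gigaflops (GFLOPS) range toward teraflops (TFLOPS), while CPU performance grew along a much flatter curve.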
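The link between ALU count and throughput can be made concrete with the usual back-of-the-envelope formula for theoretical peak: number of ALUs times clock rate times floating-point operations per ALU per cycle. The GTX 280 specs plugged in here (240 streaming processors, a 1.296 GHz shader clock, 3 FLOPs per cycle from a multiply-add plus a dual-issued multiply) are assumptions taken from commonly cited figures, not from this text. The helper name peak_flops is hypothetical.

```python
# Theoretical peak throughput = ALU count x clock rate x FLOPs per ALU per cycle.
# The GTX 280 figures below are assumed, commonly cited specs used for illustration.

GIGA = 1e9   # 1 GFLOPS = 10^9 floating-point operations per second
TERA = 1e12  # 1 TFLOPS = 10^12 floating-point operations per second


def peak_flops(alus: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Return the theoretical peak floating-point operations per second."""
    return alus * clock_hz * flops_per_cycle


# Assumed: 240 ALUs (streaming processors) at a 1.296 GHz shader clock,
# each retiring 3 FLOPs per cycle (a multiply-add plus a dual-issued multiply).
gtx280 = peak_flops(240, 1.296 * GIGA, 3)

print(f"GTX 280 peak: {gtx280 / GIGA:.0f} GFLOPS ({gtx280 / TERA:.2f} TFLOPS)")
```

The result, roughly 933 GFLOPS, sits just below 1 TFLOPS. A peak figure like this assumes every ALU does useful work on every cycle, so sustained application throughput is typically lower.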

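By 2009, the high-end Intel Core i7-960 offered a peak of roughly 70 GFLOPS, while the NVIDIA GTX 280 reached roughly 933 GFLOPS. This was more than a speedup. It was a fundamental restructuring of how computation is done, one that prioritizes throughput over the execution speed of any single instruction.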
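The size of the gap follows directly from the two peak figures quoted above. The minimal Python check below (variable names are illustrative) computes the ratio and its base-10 logarithm. A logarithm above 1 means the GPU's peak exceeded the CPU's by more than one order of magnitude.

```python
import math

# Peak figures quoted in the text, in GFLOPS.
cpu_gflops = 70.0   # Intel Core i7-960
gpu_gflops = 933.0  # NVIDIA GTX 280

# How many times higher the GPU's peak is than the CPU's.
ratio = gpu_gflops / cpu_gflops

# log10 of the ratio: 1.0 would be exactly one order of magnitude (10x).
orders_of_magnitude = math.log10(ratio)

print(f"GPU/CPU peak ratio: {ratio:.1f}x")  # about 13.3x
print(f"Orders of magnitude: {orders_of_magnitude:.2f}")
```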

QUESTION 1

What primary constraint led to the 'Power Wall' for traditional CPUs?

The lack of available memory in the early 2000s.

Thermal and power limitations when increasing clock speeds.

A shortage of transistors on the silicon die.

The transition from 32-bit to 64-bit architectures.

QUESTION 2

During the Great Divergence, which industry provided the economic engine for GPU R&D?

The Financial High-Frequency Trading market.

The Oil and Gas seismic exploration industry.

The Video Game industry.

The Cryptocurrency mining industry.

QUESTION 3

By 2009, how did the peak performance of an NVIDIA GTX 280 compare to an Intel Core i7-960?

They were roughly equal in throughput.

The CPU was twice as fast as the GPU.

The GPU was more than an order of magnitude higher (~13x).

The GPU was 100x faster than the CPU.

QUESTION 4

GPUs achieve higher throughput by dedicating more transistors to which component?

Large Level-3 Caches.

Complex Branch Prediction logic.

Arithmetic Logic Units (ALUs).

Instruction Decoders.

QUESTION 5

What is the correct unit for measuring one trillion floating-point operations per second?

GFLOPS.

Teraflops.

Petaflops.

Megaflops.